SECTION 1
INTRODUCTION

This report presents the results of a Phase I Small Business Innovation Research (SBIR) project funded by the Office of Naval Research (ONR) under topic N91-297, Condition-Based Machinery Maintenance. Details of our proposed follow-on work for Phase II are given in ALPHATECH (1992).

1.1 IDENTIFICATION AND SIGNIFICANCE OF THE PROBLEM

The timely and reliable detection of changes in the dynamic behavior of complex systems and signals is a problem of considerable importance in a vast array of military and civilian applications. As we place increasingly demanding objectives on system performance, cost, and reliability, the need for such detection methods, and the requirements placed on them, grow commensurately. For example, the increasing reliance on computer control (for the fly-by-wire control of advanced high-performance aircraft and helicopters, the navigation of autonomous vehicles, etc.) makes the detection of system anomalies essential, since by their very nature such automatic systems do not have the luxury of relying on the extraordinary but workload-limited detection capabilities of human pilots. Also, the cost of modern military systems is such that tremendous payoffs can be gained if the availability of a weapons system is improved or its life cycle cost reduced.

These objectives provided much of the motivation for the development of self-repairing flight control system concepts (Weiss and Hsu, 1987) for the in-flight detection of battle damage and of sensor and actuator failures in advanced aircraft. Such detection serves two purposes: 1) to facilitate control system reconfiguration that allows mission completion (or at least the safe return of the vehicle), and 2) to provide early diagnosis of problems, which can speed up the maintenance process and reduce turn-around time.
Furthermore, the reliable detection of component damage or failure can have a dramatic effect on the cost of maintaining and/or replacing an advanced military vehicle such as a helicopter, ship, or fighter. Specifically, the total cost of such a system is so high that the need to avoid losing it to an undetected component failure places severe demands on the overall reliability of components and their monitoring systems.

Moreover, this need for reliability has typically led to the adoption of extremely conservative maintenance and replacement procedures: components are automatically replaced when their time in service reaches a prescribed limit, usually set significantly below their expected failure times. Thus the availability of advanced and reliable fault detection systems offers the promise not only of improved system reliability but also of longer component time in service: detecting the onset of problems allows “retirement for cause” rather than the more expensive present practice of replacing components whether they need it or not.

These and a variety of other factors and applications have led to considerable research and development activity over the past 20 years, resulting in an array of detection and diagnosis methods (see, for example, the widely referenced surveys Willsky (1976) and Basseville (1987)). These methods provide an analytically sound, proven-in-practice foundation from which to pursue the new challenges that arise as we push harder on the envelope of performance, reliability, and cost. Moreover, in the past few years significant new methods of signal analysis and pattern recognition (in particular, wavelet transforms and artificial neural networks) have been developed that promise to add significantly to the arsenal of detection methods and to the range of applications that can be dealt with successfully.

1.2  OBJECTIVES  OF  PHASE  I 

The objective of the Phase I effort was to assess the efficacy of wavelet techniques for selecting features upon which an adaptive classifier could base its decisions regarding abnormal changes in system behavior. For reliable, robust classification with low false alarm rates, these features must be:
•  of high energy in at least one case (normal or failed), so that they persist even in the presence of environmental noise or transient disturbances; and

•  statistically significant in separating two or more cases from one another, so that they contribute meaningful information to the pattern classifier.

Our objective was not to develop new classification techniques, but rather to modify off-the-shelf artificial neural network (ANN) technology (Lau and Widrow, 1990a, 1990b) as needed to integrate it with a front-end feature extractor based on wavelet techniques.

Wavelets offer many different ways to access the structure of a signal in time/scale space. The continuous wavelet transform (CWT) (Ruskai, Beylkin, et al., 1992; Daubechies, 1990; Mallat, 1989a, 1989b; Meyer, 1988) converts a time signal into an image, from which features can be extracted using image processing techniques. The wavelet packet transform (WPT) (Coifman and Wickerhauser, 1992; Coifman et al., 1990) derives coefficients of wavelet basis functions that characterize the time/scale energy distribution much more flexibly than discrete Fourier transforms (DFTs) permit. Variations on the WPT permit the selection of subsets of an overcomplete set of basis functions to find the most significant elements of a signal. More recent extensions to wavelet techniques, presented under the general heading of multiscale signal processing, create even more options for feature characterization. Our initial goal was to select several alternatives and to compare their performance and computational requirements on whatever data were available. Because of time and budget constraints, however, we limited our comparison to CWTs and WPTs.

Because the dominant challenge in failure detection problems is to identify a concise yet distinctive set of features on which the detection/classification process can be made to depend, our emphasis in Phase I was on the back end of the feature-selection process: we assumed that the full set of wavelet transform coefficients was already available, and then determined which coefficients were most critical to good performance of an ANN classifier. Helicopter gearbox and shipboard pump accelerometer data, supplied by the Navy, were passed through a CWT or WPT preprocessor and then used to train ANN classifiers. Statistics on false alarm rates, missed detections, and misclassification errors were used to quantify the performance of the proposed methodology.
1.3 OVERVIEW OF PHASE I RESULTS

Phase I of this effort clearly demonstrated the feasibility of incipient fault detection for vibrating systems, not only under bench test conditions (helicopter gearbox) but also under mild operating conditions (condensate and fire pumps). These results were obtained with a balanced combination of CWTs and ANNs: we used the CWT to select features for an ANN classifier. The wavelet transform provided enough visibility into fault signals to allow us to reduce the feature set to 10-15 features. We used a low-dimensional, conventional ANN classifier (Widrow et al., 1988) with rejection of ambiguous classifications. We achieved 0.000 probability of false alarm, 0.000 probability of missed detection, and < 0.04 probability of deferral (to a subsequent feature vector) for all three data sets provided by ONR. The major product of our Phase I work is a single methodology for identifying robust features that lead to these performance levels.

1.4  REPORT  ORGANIZATION 

Section  2  describes  the  major  components  of  the  technical  approach  and  the  main  results  of 
this  Phase  I  effort.  Section  3  presents  the  main  conclusions  and  recommendations  for  future 
effort.  Appendix  A  contains  an  overview  of  the  CWT  and  presents  some  examples  to  give  the 
reader  insight  into  the  time-frequency  information  provided  by  the  CWT  based  on  the  Kiang 
wavelet  used  in  this  work. 
SECTION 2

TECHNICAL APPROACH TO PHASE I

This section describes the data used, the major components of the technical approach, and the main results obtained for the three types of vibrating systems considered in this work. The most important steps of the technical approach are illustrated with selected examples based on the available data sets, using illuminating color plates (located just before the body of this report).

2.1  DATA  USED  IN  PHASE  I 

For this research, ONR supplied data (through the Naval Command, Control, and Ocean Surveillance Center, NCCOSC) for three vibrating systems: a helicopter gearbox, condensate pumps, and fire pumps. These data are from accelerometers that measure vibrations at one or more places on the case of the vibrating mechanism.

The gearbox data (from the relatively simple intermediate (42-degree) gearbox of a TH-1L helicopter) consisted of vibration readings (sampled at 48 kHz) from two accelerometers (channels 5 and 6, oriented with and orthogonal to the bearing load zones, respectively) mounted on the gearbox output end for six separate fault conditions: no defect (ND), bearing inner race fault (IR), bearing rolling element fault (RE), bearing outer race fault (OR), gear spall fault (SP), and gear 1/2 tooth cut fault (TC). This is a subset of the “Hollins data base,” developed by Mark Hollins of the Naval Air Test Center (NATC). The pump data consisted of vibration readings (sampled at 50 kHz) from two accelerometer triads (axial, radial, tangential), one mounted on the motor end and one on the pump end of the assembly. The condensate pump data consisted of eight data segments that included two fault types and unfailed data. The fire pump data consisted of 16 data segments that included four fault types and unfailed data. Unlike the helicopter data, which are bench test data with seeded faults, the pump data were obtained from shipboard pumps operating under relatively mild conditions.
Tables 2-1 and 2-2 present all the information we were given on the condensate and fire pump data, respectively. In both cases, fault code 0 labels good pumps (normal cases). Also, fire pump fault codes 3A and 3B denote the same fault type on different pumps.


TABLE 2-1. CONDENSATE PUMP INFORMATION

Segment Number   Pump ID   Fault Code   RPM
      1          CP-1A         1        900
      2          CP-1B         0        900
      3          CP-4A         2        885
      4          CP-4B         0        885
      5          CP-2A         0        890
      6          CP-2B         0        890
      7          CP-3A         0        892
      8          CP-3B         2        892


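TABLE 2-2. FIRE PUMP INFORMATION

Pump ID (in table order): FP-9, FP-3, FP-2, FP-1, FP-4, FP-5, FP-6, FP-5A, FP-6A, FP-13A, FP-12, FP-13, FP-14, FP-17, FP-16, FP-15
Fault Code: 3A, 0, 0, 0, 0, 0, 0, 0 (remaining entries not recoverable)
RPM: 3576, 3585, 3585, 3585, 3575, 3580, 3580, 3585, 3580, 3590, 3585, 3580 (remaining entries not recoverable)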
For our test systems we used only one channel, from one sensor; we deferred fusion of results from multichannel data to Phase II. For the gearbox system, only channel 5 was used for all conditions. For both pump systems, only the axial component of the pump-end accelerometer triad was used. In a sense, we deliberately made the Phase I problem harder by ignoring some sources of information, in order to demonstrate the power (or lack thereof) of wavelet techniques on a fault detection/classification problem that is, in this respect, more difficult than one would expect to encounter in the field, even under more severe conditions.

2.2  SYSTEM  STRUCTURE 

We adopted the conventional architecture of an adaptive classifier (Fig. 2-1): a real-time preprocessor to focus the information about the state of a system into a low-dimensional feature vector, followed by an adaptive pattern analyzer to map feature vectors into detections and classifications. Our work emphasized the development of the preprocessor, using the insight offered by recent advances in the mathematics of wavelets.


[Figure 2-1: sensors feed a tunable preprocessor (feature detectors) that extracts a set of wavelet transform features as 8-16 element feature vectors; a neural net pattern analyzer classifies faults using those features. Detections go to the pilot/operator; classifications go to maintainers.]

Figure 2-1. Incipient Fault Detection and Classification System Structure


Our Phase I proposal indicated that our approach of choice was to use the WPT (Coifman and Wickerhauser, 1992; Coifman, Meyer, et al., 1990) to identify locations in time/scale space indicative of faults. This approach met with less success than expected. With hindsight, and on the basis of the considerable performance data generated during the project, we believe that the WPT is most appropriate for problems where classification depends on the timing and internal structure of transient events, such as classifying biological sounds in the sea. It offers considerably less advantage in fault analysis of vibrating systems, where the signatures are of relatively long duration and statistically quite stationary. However, we remain interested in the WPT for detecting incipient faults in systems that emit transient signals as part of their normal operation (e.g., heavy-duty mechanical or electrical switching systems).

As an alternative, we turned to the CWT. Like other image-visualization techniques such as the sonogram, lofargram, or waterfall display, the CWT converts a one-dimensional signal into a two-dimensional image, at substantial computational cost. Individual sub-bands of the CWT, however, can be evaluated quite efficiently in a preprocessor. Therefore, we use the full CWT during the design process to identify a few bands that differentiate among cases, and implement actual feature detectors only for those specific bands, focusing the information available in a signal into a small set of features. This allows us to find very small feature vectors (on the order of 10-20 elements) that nonetheless yield outstanding detection and classification performance. A brief explanation of the CWT and examples of the CWTs of elementary signals are presented in Appendix A.

For the pattern analyzer, we used conventional three-layer feedforward ANNs. As will be seen, we succeeded in finding feature sets that are nearly convex and linearly separable, so we did not need complex network topologies or exotic training algorithms. (In fact, we were able to set the number of elements in the hidden layer equal to the number of output elements.)

To improve performance, we suppressed the classification result entirely if the maximum output value was less than some multiple of the next largest output value, deferring the classification to the next available feature vector. Changing this multiple trades deferral rate against false alarm/missed detection performance. A multiple of 2.0 was adequate to eliminate all false alarms and missed detections while keeping the deferral rate below 4%.
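The deferral rule can be sketched in a few lines of Python. This is a minimal illustration, not the Phase I implementation: it assumes the ANN returns one non-negative activation per class, and the function name `classify_or_defer` is hypothetical; only the ratio test and the multiple of 2.0 come from the text above.

```python
from typing import Optional, Sequence


def classify_or_defer(outputs: Sequence[float], ratio: float = 2.0) -> Optional[int]:
    """Return the index of the winning class, or None to defer.

    The decision is suppressed (deferred to the next feature vector) when the
    largest output is less than `ratio` times the next largest output.
    """
    if len(outputs) < 2:
        raise ValueError("need at least two class outputs")
    ranked = sorted(range(len(outputs)), key=lambda i: outputs[i], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if outputs[best] < ratio * outputs[runner_up]:
        return None  # ambiguous: wait for the next feature vector
    return best


if __name__ == "__main__":
    print(classify_or_defer([0.9, 0.3, 0.1]))  # 0.9 >= 2 * 0.3, so class 0
    print(classify_or_defer([0.6, 0.4, 0.1]))  # 0.6 < 2 * 0.4, so None (deferred)
```

Raising `ratio` defers more often but demands a clearer winner; lowering it does the reverse, which is the trade described above.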

The Phase I feature-selection process used a number of tools to develop feature sets. KHOROS signal processing routines (KHOROS Group, 1992), running on a Sun computer, supported editing and preliminary analysis of raw data files. A custom Macintosh Pascal package computed the CWT, and another extracted the selected feature vectors from the raw data. Excel, a commercial package from Microsoft, supported the statistical cluster analysis. Macintosh NeuralWorks, a commercial package from NeuralWare, was used to carry out the ANN training and testing. MATLAB, a commercial package from The MathWorks, Inc., was used to compute performance metrics.

2.3  WAVELET-BASED  TUNABLE  PREPROCESSOR 

Figure 2-2 presents the wavelet-based tunable feature extractor developed in this Phase I effort. The CWT is computed using the Kiang wavelet, which allows one to select the frequency and time resolutions appropriate for extracting the features of interest from the CWT. Depending on the signal characteristics, one may have to smooth and decimate the CWT before extracting the frequency bands of interest. These bands are then parameterized to achieve better feature separability. Because of the properties of this wavelet-based feature extractor, it is not necessary to compute the entire CWT to extract a few features; only the frequency bands associated with the features of interest need to be computed. This leads to a significant reduction in computational effort when the number of selected features is relatively small (say, 10 to 20). (Note that while this preprocessor is currently implemented in software, it is a good candidate for hardware implementation; on a Macintosh Quadra 900 computer, with no optimization, it runs about 100 times slower than real time.)
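As a rough illustration of computing only the selected bands, the sketch below evaluates one CWT row per center frequency by direct convolution, then smooths over time and decimates. The Kiang wavelet is not defined in this report, so a complex Gaussian-windowed sinusoid (Morlet-type) is swapped in; the helper names, the `cycles` and `smooth_s` parameters, and the normalization are all assumptions for illustration.

```python
import numpy as np


def morlet_kernel(fc, fs, cycles=7.0):
    """Complex Gaussian-windowed sinusoid centred at fc Hz.

    Stand-in for the Kiang wavelet, which is not specified here. `cycles` sets
    the envelope width and hence the frequency resolution of the band.
    """
    sigma_t = cycles / (2.0 * np.pi * fc)                 # envelope std. dev. (s)
    t = np.arange(-4.0 * sigma_t, 4.0 * sigma_t, 1.0 / fs)
    envelope = np.exp(-t ** 2 / (2.0 * sigma_t ** 2))
    # Normalised so a unit-amplitude sinusoid at fc yields a magnitude near 0.5.
    return np.exp(2j * np.pi * fc * t) * envelope / envelope.sum()


def band_magnitude(x, fc, fs, cycles=7.0, smooth_s=0.01):
    """One CWT row |x * psi_fc|, smoothed over time and decimated."""
    row = np.abs(np.convolve(x, morlet_kernel(fc, fs, cycles), mode="same"))
    n = max(1, int(round(smooth_s * fs)))                 # smoothing length (samples)
    smoothed = np.convolve(row, np.ones(n) / n, mode="same")
    return smoothed[::n]                                  # decimate by the same length


def extract_features(x, centres, fs, **kwargs):
    """Feature vector: mean smoothed magnitude in each selected band only."""
    return np.array([band_magnitude(x, fc, fs, **kwargs).mean() for fc in centres])


if __name__ == "__main__":
    fs = 48_000
    t = np.arange(int(0.5 * fs)) / fs                     # 500 msec of signal
    x = np.sin(2 * np.pi * 1000.0 * t)                    # a single 1 kHz line
    print(extract_features(x, [1000.0, 3000.0], fs))      # first band dominates
```

Only the rows for the listed centers are ever computed, which is where the savings over a full CWT come from; in this sketch `cycles` plays the role of the resolution choice the wavelet provides.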


[Figure 2-2 image; axis label: frequency f]

Figure 2-2. Tunable Feature Extractor


As our emphasis in Phase I was on the selection of a concise, focused feature set, we exploited the visibility into time/scale space offered by the CWT. Our objective was to focus all of the potentially available time/frequency information into a small feature space, since a small, separable feature set reduces the complexity required of a classifier, and hence the risk of slow convergence or non-convergence.

Another advantage of the CWT is its ability to isolate robust, high-energy features that can be readily detected and can be masked only by a great deal of external energy. Our approach to feature selection was explicitly to ignore regions of time/scale space with consistently low energy, on the grounds that whatever classification might be possible using such features would not be robust to disturbances. (However, we did include one low-energy band specifically as a disturbance detector: classifications are suppressed whenever substantial energy appears in this band.) We knew that bench test or mild-operation data are invariably cleaner than field data (they may not represent a complete range of normal operations or disturbances, and may contain artifacts not present in real data), and therefore we sought features that would serve as well in noisy environments as they do on these data sets. Because we address robustness at the very beginning of the feature-selection process, we expect the approach to carry over to much more challenging data.
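The disturbance-detector band described above amounts to a gate in front of the classifier. A minimal sketch, assuming per-frame band energies are already available; the threshold rule (k times the median energy of that band under normal operation), the value k = 4, and the helper names are hypothetical and do not come from the Phase I work.

```python
import numpy as np


def disturbance_flags(band_energy, baseline_energy, k=4.0):
    """True for frames whose energy in the normally quiet band exceeds k x baseline.

    band_energy: per-frame energy in the low-energy "disturbance" band.
    baseline_energy: energies of the same band under normal, undisturbed operation.
    k: hypothetical threshold multiple.
    """
    level = np.median(np.asarray(baseline_energy, dtype=float))
    return np.asarray(band_energy, dtype=float) > k * level


def gated_frames(feature_vectors, flags):
    """Keep feature vectors from undisturbed frames; flagged frames are not classified."""
    return np.asarray(feature_vectors)[~np.asarray(flags, dtype=bool)]


if __name__ == "__main__":
    flags = disturbance_flags([0.9, 5.0, 1.1], baseline_energy=[1.0, 1.2, 0.8, 1.0])
    print(flags)                                          # middle frame is disturbed
    print(gated_frames(np.arange(6).reshape(3, 2), flags))
```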

2.3.1  The  Continuous  Wavelet  Transform 

The CWT is displayed as an image. It is qualitatively similar to other imaging representations of a signal, such as sonograms, lofargrams, or waterfall displays. The biggest distinction is that one raster line appears in the image for every sample in the signal: there is no windowing of the data, and hence no windowing artifacts in the image. A minor distinction is that the scale axis is logarithmic, since wavelet theory involves continuous scaling of a single basis function. If one thinks of the discrete Fourier transform as the output of a bank of constant-bandwidth filters, then one can think of the CWT as the output of a bank of constant-Q filters (Q being the ratio of center frequency to bandwidth). (The WPT, on the other hand, is like a variable-bandwidth, variable-Q filter bank (Vetterli and Herley, 1990).)
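To make the filter-bank analogy concrete, the sketch below lays out center frequencies and bandwidths for a constant-bandwidth (DFT-like) bank and a constant-Q (CWT-like) bank. The frequency ranges, bands per octave, and Q value are arbitrary illustrative choices, not parameters used in Phase I.

```python
import numpy as np


def constant_bandwidth_bank(f_lo, f_hi, bandwidth):
    """DFT-like bank: equally spaced centres, every filter the same width (Hz)."""
    centres = np.arange(f_lo, f_hi + bandwidth / 2.0, bandwidth)
    return centres, np.full_like(centres, bandwidth, dtype=float)


def constant_q_bank(f_lo, f_hi, bands_per_octave, q):
    """CWT-like bank: log-spaced centres, bandwidth = centre / Q for every filter."""
    n_bands = int(round(np.log2(f_hi / f_lo) * bands_per_octave)) + 1
    centres = f_lo * 2.0 ** (np.arange(n_bands) / bands_per_octave)
    return centres, centres / q


if __name__ == "__main__":
    for fc, bw in zip(*constant_q_bank(512.0, 16384.0, 1, 10.0)):
        print(f"constant-Q:  centre {fc:8.1f} Hz, bandwidth {bw:7.1f} Hz")
    for fc, bw in zip(*constant_bandwidth_bank(250.0, 1000.0, 250.0)):
        print(f"constant-BW: centre {fc:8.1f} Hz, bandwidth {bw:7.1f} Hz")
```

In the constant-Q bank the bandwidth doubles with each octave, so on the logarithmic scale axis of a CWT image every band occupies the same width.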

Visual analysis of the CWT reveals areas of interest in vibrational data. The raw CWT magnitudes include high-frequency artifacts that can be removed by smoothing over time. The helicopter data segments show high-energy, narrowband features overlaid by short disturbances. The pump data segments show lower-energy, broadband features overlaid by regular impulsive disturbances. In all cases, the features are stable over time.

The CWTs presented in the color plates referenced below (located just before the body of this document) illustrate these general observations. These CWTs use the Kiang wavelet (fine frequency, coarse time resolution). Hue encodes the log magnitude of the CWT (blue = low, red = high); phase information is ignored. Time divisions are 62.5 msec wide. The frequency scale is logarithmic, and frequency divisions correspond to octaves.

2.3.2  Smoothing  the  CWT 

The two images in Plate A are from the first 500 msec of channel 5 of the normal helicopter gearbox data (sampled at 48 kHz). The left image is the CWT sampled every 2 msec; the full CWT for this data segment would be 24,000 pixels high (one raster line per sample). The bright red line just below 2,048 Hz is the gear mesh fundamental. Harmonics of this fundamental appear at higher frequencies (finer scales), although there appears to be little energy in the fourth, sixth, and seventh harmonics under normal conditions. The data appear to be high-pass filtered with a cut-off frequency around 1 kHz, although some frequency lines are clearly visible below this point.

Because  faults  in  vibrating  systems  impact  a  sensor  on  every  cycle  of  the  mechanism,  we 
seek  features  that  persist  over  relatively  long  periods  of  time  (many  cycles).  While  the  texture  of 
the  raw  CWT  between  harmonic  lines  is  interesting,  it  would  be  imprudent  to  attempt  to  classify 
faults  based  on  the  structure  of  this  texture.  Thus  for  this  data  set,  and  for  all  other  CWT  images 
we  constructed,  the  image  was  smoothed  in  the  time  dimension  to  suppress  high-frequency 
textures,  and  enhance  the  stationary  elements  of  the  signal.  The  right  image  in  Plate  A  shows  the 
results  of  this  smoothing.  Among  other  things,  the  smoothing  enhances  the  appearance  of  a 
secondary  line  just  above  the  mesh  fundamental.  Also,  some  5  Hz  modulation  on  the  fifth 
harmonic  becomes  more  apparent. 

Note  that  we  do  not  present  this  entire  image  to  the  ANN  for  classification.  Our  goal  is  to 
identify  features  in  the  image  that  can  be  parameterized,  and  to  compute  much  more  concise 
parameter  vectors  from  the  image  to  submit  to  the  net. 
Note also that the CWT visualization makes the feature-selection process quite efficient. Having developed a methodology on the gearbox data, we were able to construct low-dimensional feature sets for the two pump data sets in a matter of hours.

2.3.3  Changing  the  Wavelet  Basis 

The two images in Plate B are from the first 500 msec of failed condensate pump data (CP-1A from Table 2-1). The left image is the analog of the smoothed helicopter CWT. It immediately shows the lack of stable, narrowband elements in the signal. Again, it would be imprudent to attempt classification based on the microstructure of the transient narrowband features in this image. Instead, we seek more broadband features.

One appeal of the CWT is that it allows one to vary time/scale resolution continuously. By changing the wavelet on which the transform is based, one can sacrifice resolution in scale, which is exactly what is necessary to find broadband features. The right image shows the same data, but uses a wavelet transform with 1/10th the frequency resolution (along with additional smoothing over time to suppress transients and textures). This image clearly shows the locations of high energy content, and these regions are surprisingly stable compared to those of the left image.
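The trade between time and frequency resolution can be made explicit for a Gaussian-windowed (Morlet-type) wavelet. This is only a stand-in, since the Kiang wavelet is not defined in this report, and the parameter n below is introduced purely for illustration:

```latex
\psi_{f_c}(t) = e^{2\pi i f_c t}\, e^{-t^2/(2\sigma_t^2)},
\qquad
|\hat{\psi}_{f_c}(f)| \propto e^{-(f - f_c)^2/(2\sigma_f^2)},
\qquad
\sigma_f = \frac{1}{2\pi\sigma_t}.

\text{Choosing } \sigma_t = \frac{n}{2\pi f_c}
\;\Rightarrow\;
\sigma_f = \frac{f_c}{n},
\qquad
\frac{f_c}{\sigma_f} = n .
```

Because f_c/σ_f = n at every center frequency, the transform remains constant-Q; reducing n by a factor of 10 widens every band tenfold (1/10th the frequency resolution) and shortens the time spread σ_t tenfold.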

2.3.4  Channel  Selection 

The two images in Plate C are from the first 500 msec of unfailed fire pump data (FP-3 from Table 2-2). The left image is from the radial channel, the right from the axial channel. Lines for the first few harmonics of the shaft rotation frequency (about 60 Hz) are clearly visible, along with a faint harmonic series based at about 400 Hz, and some broadband signal from 300 Hz to 1,500 Hz.

As mentioned earlier, we limited Phase I processing to a single channel of data for each equipment type, so that we did not exhaust all of the potential processing gain on these clean data and thus retain ways to counter the additional complexity one would expect in field data. We selected the axial channel alone for further processing in the pump cases, and channel 5 alone in the gearbox case.
2.3.5 Gearbox Fault Signatures

Plate D shows segments of the CWT for the six cases of gearbox data. Each image represents 250 msec of signal, from 512 Hz to 16 kHz. Note the difference in structure of the CWT around the third harmonic of the mesh frequency (mesh fundamental at 1,935 Hz), both in breadth and texture.

2.3.6  Gearbox  Fault  Masks 

Plate E shows the same segments of the CWT for the six cases of gearbox data, with low-energy regions masked out. Our rationale is to avoid features that are weak, as they are easily compromised by disturbances or interference and hence do not contribute to high-reliability detection. The technique used to mask these regions is basically local noise-floor estimation across scale, masking areas that fall below the estimated floor. Note how this technique highlights the significant differences in structure around the third harmonic and around the 1,050 Hz line. Techniques such as this morphological filter provide a quantitative evaluation of feature set performance before time and effort are spent training and testing the adaptive classifier.
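The masking step can be approximated as follows. This is a sketch under assumptions: the noise floor is taken as a running median across neighboring scales within each time column, and cells below a fixed margin above that floor are masked; the window half-width, the margin, and the helper name are hypothetical, and the Phase I filter may differ in detail.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def noise_floor_mask(image, half_width=8, margin=2.0):
    """True where a CWT cell rises above its local noise floor.

    image: non-negative CWT magnitudes, shape (n_scales, n_times).
    The floor for each cell is the median over +/- half_width neighbouring
    scales in the same time column (edges padded by repetition). Cells below
    margin * floor are masked out (False).
    """
    image = np.asarray(image, dtype=float)
    padded = np.pad(image, ((half_width, half_width), (0, 0)), mode="edge")
    windows = sliding_window_view(padded, 2 * half_width + 1, axis=0)
    floor = np.median(windows, axis=-1)          # shape (n_scales, n_times)
    return image >= margin * floor


if __name__ == "__main__":
    img = np.ones((32, 5))
    img[10, :] = 10.0                            # one strong line at all times
    mask = noise_floor_mask(img)
    masked = np.where(mask, img, 0.0)            # low-energy regions set to zero
    print(mask.sum(axis=1)[8:13])                # only scale row 10 survives
```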

2.3.7  Condensate  Pump  Signatures 

Plate F shows the CWTs for 125 msec of the axial channel for each of the eight condensate pump data segments in Table 2-1. The log frequency scale, between 16 Hz and 16 kHz, is divided into octaves. Segment 1 contains a fault of type 1, segments 3 and 8 each contain a fault of type 2, and the rest are good pumps (normal case). Note that the CWTs show clear differences between the pumps in segment pairs 1 and 2, 3 and 4, and 7 and 8; each of these pairs includes a normal pump and a defective pump. On the other hand, segments 5 and 6, both from good pumps, display similar high-energy features.

In  contrast  with  the  helicopter  gearbox  data,  the  condensate  pump  CWTs  contain  wider 
high-energy  regions,  but  they  are — as  in  the  gearbox  case — relatively  stable  over  time. 

2.4  FEATURE  SEPARATION 

Detection and classification become exceptionally easy if the clusters of features corresponding to different cases exhibit two properties: convexity and separability. In that situation, classification becomes a matter of estimating boundaries to separate the clusters, and we can allocate one element of the hidden layer of an ANN to each cluster.

One never knows ahead of time whether or not a feature set will be convex and separable. We selected 500 msec of data from each case provided as a basis for statistical analyses of separability. We used features that essentially correspond to a few frequency slices through the CWT, i.e., energies in particular frequency bands. We selected the set of bands on the basis of overall energy content; recall that robust classification is possible only from features with high energy differences between cases. For the gearbox and fire pump data, which have strong, clear, narrow fundamentals, we adapted the band center frequencies to the center of that line (using the features themselves instead of referring to external synchronization signals). For the condensate pump data (which lack such a stable reference feature and whose energies are more dispersed across frequency), we left the frequency bands constant.
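One way to adapt band centers to a strong fundamental line is to search for the spectral peak near a nominal frequency and then place bands at multiples of it. The sketch below uses a windowed FFT peak search for brevity; this is a swapped-in technique (the text above describes using the CWT features themselves), and `search_frac` and the helper names are hypothetical.

```python
import numpy as np


def track_fundamental(x, fs, f_nominal, search_frac=0.05):
    """Frequency (Hz) of the strongest spectral line within +/- search_frac of f_nominal."""
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    in_band = np.abs(freqs - f_nominal) <= search_frac * f_nominal
    return freqs[in_band][np.argmax(spectrum[in_band])]


def adapted_centres(x, fs, f_nominal, multipliers):
    """Band centres that follow the measured fundamental rather than a fixed value."""
    f0 = track_fundamental(x, fs, f_nominal)
    return f0 * np.asarray(multipliers, dtype=float)


if __name__ == "__main__":
    fs = 48_000
    t = np.arange(int(0.5 * fs)) / fs            # 500 msec, 2 Hz FFT bin spacing
    x = np.sin(2 * np.pi * 1935.0 * t)
    print(adapted_centres(x, fs, 2000.0, [1, 2, 3]))
```

With 500 msec of data the FFT bins are 2 Hz apart, so the tracked line is accurate to roughly one bin; longer records or peak interpolation would refine it.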

After collecting candidate features, but before training an ANN, we evaluated feature cluster separations. Table 2-3 shows the maximum separations between all pairs of clusters in terms of Fisher coefficients, which are essentially distances normalized to units of standard deviations. This table was obtained by computing, for each feature, the Fisher coefficient between every pair of fault conditions, and then selecting the maximum coefficient across all features for each fault condition pair. Any pair of clusters more than three or so units apart should be readily separable by an ANN classifier.


TABLE 2-3. PHASE I CLUSTER SEPARATIONS, HELICOPTER DATA

                    Normal      IR      RE      OR      SP      TC
Normal                0.00    5.67   10.79    9.79   11.14   11.28
Inner Race            5.67    0.00    4.13    5.25   13.31    6.05
Rolling Element      10.79    4.13    0.00    2.07    5.89    3.45
Outer Race            9.79    5.25    2.07    0.00    7.18    3.29
Gear Spall           11.14   13.31    5.89    7.18    0.00    8.61
Tooth Cut            11.28    6.05    3.45    3.29    8.61    0.00

(IR = Inner Race, RE = Rolling Element, OR = Outer Race, SP = Gear Spall, TC = Tooth Cut; entries are maximum Fisher coefficients, in standard deviations.)
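The construction of Table 2-3 can be sketched as follows. The report does not spell out its exact Fisher-coefficient formula, so the definition used here (the absolute difference of the two cluster means divided by the root-mean-square of their standard deviations) is an assumption; the rest follows the description: compute the coefficient feature by feature for every pair of cases, then keep the largest value for each pair. The function names are hypothetical.

```python
import itertools
import numpy as np

def fisher_coefficient(a, b):
    """Separation of two 1-D samples in units of standard deviation.

    Assumed definition: |mean(a) - mean(b)| / sqrt((var(a) + var(b)) / 2)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2.0)
    return abs(a.mean() - b.mean()) / pooled_sd

def max_separation_table(clusters):
    """clusters maps a case name to an array of shape (n_vectors, n_features).

    For every pair of cases, compute the coefficient feature by feature and keep
    the largest value; the result is symmetric, like Table 2-3."""
    table = {}
    for (name_i, x_i), (name_j, x_j) in itertools.combinations(clusters.items(), 2):
        best = max(fisher_coefficient(x_i[:, k], x_j[:, k]) for k in range(x_i.shape[1]))
        table[(name_i, name_j)] = table[(name_j, name_i)] = best
    return table

# Usage: feature 0 separates the two toy clusters by 3 units, feature 1 by only 0.5.
clusters = {
    "Normal": np.array([[0.0, 10.0], [1.0, 11.0], [2.0, 12.0]]),
    "Fault": np.array([[3.0, 10.5], [4.0, 11.5], [5.0, 12.5]]),
}
table = max_separation_table(clusters)   # table[("Normal", "Fault")] == 3.0
```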

We  found  that  the  CWT  features  provide  good  separation  between  cases.  Below  is  a 
summary  of  the  main  characteristics  of  these  features  for  each  of  the  test  systems. 
Helicopter gearbox feature vectors (channel 5) contain heights of narrowband (1/35-octave) lines:

• center frequencies adapt to changes in the fundamental mesh frequency

• lines were selected at the first six harmonics of the mesh frequency, plus two other frequencies suggested by the CWT (0.5525*f and 2.7*f, where f is the fundamental mesh frequency)

• feature clusters are nearly ellipsoidal

• minimum feature cluster separation is 2.07 standard deviations (outer race/rolling element)

Condensate pump feature vectors (axial channel) contain energies in wider (1/6-octave) bands:

• center frequencies are fixed over time (and across cases)

• bands were selected at octave intervals (32-1,024 Hz), and at 1/4-octave intervals within the high-energy octaves (64-128 Hz, 512-1,024 Hz)

• feature clusters are nearly ellipsoidal

• minimum feature cluster separation is 5.87 standard deviations (CP-1A/CP-4B)

Fire pump feature vectors (axial channel) contain both narrowband and broadband features:

• center frequencies adapt to changes in the fundamental shaft frequency (limited to within 2% of nominal)

• narrow bands were selected at the first 8 shaft harmonics, and at octave intervals within the high-energy octaves (512-2,048 Hz)

• some feature clusters show suspiciously high correlation (possibly due to clipping)

• minimum feature cluster separation is 2.23 standard deviations (Fault 3/Fault 6)
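The band layouts summarized above can be written out explicitly. This is our reading of the bullets rather than code from the Phase I effort: treating the quarter-octave bands as the three interior points of each high-energy octave, and the fire pump octave bands as 512, 1,024, and 2,048 Hz, are assumptions, and all function names are hypothetical. The usage line assumes a 2 kHz mesh fundamental purely for illustration.

```python
def band_edges(fc, bandwidth_octaves):
    """Lower and upper edges (Hz) of a band `bandwidth_octaves` wide,
    centred geometrically on fc."""
    ratio = 2.0 ** (bandwidth_octaves / 2)
    return fc / ratio, fc * ratio

def gearbox_centres(f_mesh):
    """Centres that follow the mesh fundamental: six harmonics plus the two
    CWT-suggested lines at 0.5525 f and 2.7 f."""
    return [k * f_mesh for k in range(1, 7)] + [0.5525 * f_mesh, 2.7 * f_mesh]

def condensate_centres():
    """Fixed centres (assumed reading): octaves from 32 Hz to 1,024 Hz, plus the
    three interior quarter-octave points of the 64-128 Hz and 512-1,024 Hz octaves."""
    octaves = [32.0 * 2 ** k for k in range(6)]
    quarters = [low * 2 ** (q / 4) for low in (64.0, 512.0) for q in (1, 2, 3)]
    return sorted(octaves + quarters)

def fire_pump_centres(f_shaft):
    """Centres at the first eight shaft harmonics plus (assumed) octave points
    at 512, 1,024, and 2,048 Hz."""
    return [k * f_shaft for k in range(1, 9)] + [512.0, 1024.0, 2048.0]

# Usage: edges of the 1/35-octave band at the second harmonic of an assumed 2 kHz mesh.
second_harmonic_band = band_edges(gearbox_centres(2_000)[1], 1 / 35)
```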

The  following  three  subsections  graphically  illustrate  with  color  scattergrams  the  striking 
feature  separation  for  some  feature  pairs  and  all  fault  conditions  for  each  of  the  test  systems 
examined  in  this  work. 

2.4.1  Gearbox  Separation 

Plate G presents the feature clusters computed from 3 seconds of gearbox data across all six cases. Feature vectors can be obtained every 10 msec, so there are about 300 sample vectors per case. This plate shows the projection of the feature clusters onto a two-dimensional subspace
defined  by  the  power  found  in  the  second  harmonic  of  the  mesh  frequency,  and  at  a  subharmonic 
line  around  1,050  Hz.  Note  that:  1)  all  of  the  feature  clusters  appear  convex,  2)  the  normal  case  is 
well  separated  from  the  fault  cases  by  these  two  features  alone,  and  3)  several  pairs  of  faults  can  be 
separated  as  well.  Other  pairs  of  features  provide  different  kinds  of  separation,  but  all  show 
convex  clusters. 

2.4.2  Condensate  Pump  Separation 

Plate  H  presents  the  feature  clusters  from  0.5  second  of  condensate  pump  data  across  all 
eight  cases.  There  are  about  30  feature  vectors  per  case.  It  shows  the  projection  of  the  feature 
clusters  onto  the  two-dimensional  subspace  defined  by  the  power  near  615  Hz,  and  near  1,024  Hz. 
Again, note that: 1) all of the feature clusters appear convex, 2) the normal cases are well separated
from  the  fault  cases  by  these  two  features  alone,  despite  being  more  diffuse  due  to  variations 
among  units,  and  3)  type  1  and  type  2  faults  can  be  clearly  separated  as  well.  Other  pairs  of 
features  provide  different  kinds  of  separation,  but  all  show  convex  clusters. 

2.4.3  Fire  Pump  Separation 

Plate  I  presents  the  feature  clusters  computed  from  0.5  second  of  fire  pump  data  across  all 
16  cases.  There  are  about  30  feature  vectors  per  case.  It  shows  the  projection  of  the  feature 
clusters  onto  the  two-dimensional  subspace  defined  by  a  narrow  band  around  the  seventh  harmonic 
of  the  shaft  rate  and  a  wider  band  around  2,048  Hz.  Note  some  suspicious  characteristics  of  these 
clusters.  The  seventh  harmonic  of  the  normal  data  seems  to  be  limited  by  a  floor  at  42  dB  below 
the  shaft  fundamental,  making  detection  and  classification  of  fault  type  3A  (light  blue  squares — 
FP-9  from  Table  2-2)  exceptionally  easy.  Also,  for  three  of  the  test  cases  (one  normal  and  two 
fault),  the  values  of  these  features  are  exactly  6  dB  apart,  a  relationship  that  is  highly  unlikely  in 
truly random data. It is important to supplement the power of data-driven approaches with some understanding of the physics of the system under study (why the classification regions are the way they are) in order to gain confidence that the classification logic is truly robust to any artifacts that may be in the training data.
2.4.4 Artifacts in Training Data

Through an example, this subsection illustrates the need to select features for classification based not only on their separation but also on the physics of the fault mechanism. The risk is that data presented to the ANN classifier may contain variations on which classification can be based, but which bear no causal relationship to fault mechanisms. For example, we selected a feature at the sixth harmonic of the mesh frequency for the gearbox data to serve as a disturbance detector. In this frequency range (around 12 kHz), there is very little energy in any of the data unless a disturbance is present. Our idea was that a feature vector with relatively high energy in this band (above -45 dB relative to the power in the mesh fundamental) would produce an ambiguous classification when presented to the neural net, so that any output would be deferred until the disturbance subsided.

Something quite different happened. Plate J shows the gearbox feature clusters projected onto the subspace defined by the fifth and sixth harmonic power levels. Note the obvious separation between {Normal, Inner Race}, {Outer Race, Rolling Element, Tooth Cut}, and {Gear Spall}.

The  fact  that  the  magnitudes  of  sixth  harmonic  power  levels  are  so  small  suggests  that  this 
frequency  is  near  a  zero  in  the  transfer  function  between  the  vibrating  mechanics  and  the  sensor. 
The  fact  that  their  variation  is  so  small  suggests  that  the  energy  in  this  band  is  largely  background 
energy,  or  conveyed  through  a  convoluted  transmission  path.  In  either  case,  the  variations  among 
cases  are  unlikely  to  be  caused  by  the  faults  themselves,  but  rather  by  the  process  of  inserting  and 
removing  faults.  An  adaptive  classifier  will  happily  use  the  power  level  at  the  sixth  harmonic  to 
separate  the  Normal  case  from,  say,  the  Gear  Spall.  Only  additional  insight  into  the  physics  of  the 
transmission  mechanism,  or  a  set  of  data  including  several  insertions  of  the  same  fault,  would 
prevent  field  deployment  of  a  classifier  that  treats  this  artifact  as  a  valid  source  of  information. 

2.4.5  Guidelines  for  Finding  Robust  Feature  Sets 

Based  on  this  Phase  I  effort  we  have  developed  a  set  of  guidelines  for  finding  robust 
feature  sets  in  CWT  images.  These  guidelines  can  be  summarized  as  follows: 

• Be sure that features are robust to external disturbances: we seek high-energy features derived by morphological filtering along the scale axis of the CWT, with narrow bandwidths to reduce their sensitivity to impulsive disturbances. Features must also be redundant, to exploit the correlation among them, and they must be computed as frequently as the largest time constant in the preprocessor permits.

• Be sure that features are diverse: include features across a wide range of frequencies, for example, the first six harmonics of important narrowband vibrations (gearbox) or octave samples of broadband components (pumps). In addition, include one or more low-energy features to support disturbance rejection.

• Be sure that features distinguish normal from abnormal conditions: for this, compute statistics (mean, standard deviation) on each CWT bin and look for significant differences that will lead to features with high discriminating power.
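The third guideline can be sketched as a ranking of CWT bins by how far their mean log magnitude moves between normal and abnormal data, measured in units of the pooled standard deviation. The scoring rule, the array layout (frames by bins), and the function name are assumptions for illustration, and synthetic Gaussian data stand in for real CWT images.

```python
import numpy as np

def rank_cwt_bins(normal, abnormal, top=5):
    """normal, abnormal: arrays of shape (n_frames, n_bins) of CWT log magnitudes.

    Returns the indices of the `top` bins whose means differ most between the two
    conditions, in units of the pooled standard deviation, and their scores.
    Assumption: this scoring rule is illustrative, not the Phase I procedure."""
    mean_n, mean_a = normal.mean(axis=0), abnormal.mean(axis=0)
    pooled_sd = np.sqrt((normal.var(axis=0, ddof=1) + abnormal.var(axis=0, ddof=1)) / 2.0)
    scores = np.abs(mean_a - mean_n) / np.maximum(pooled_sd, np.finfo(float).eps)
    best = np.argsort(scores)[::-1][:top]
    return best, scores[best]

# Usage with synthetic data: bin 3 is shifted by five standard deviations.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, (200, 10))
abnormal = rng.normal(0.0, 1.0, (200, 10))
abnormal[:, 3] += 5.0
bins, scores = rank_cwt_bins(normal, abnormal, top=3)
```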

2.5 ARTIFICIAL NEURAL NETWORK CLASSIFIER

For  an  adaptive  classifier  we  used  a  feedforward  ANN  of  the  back-propagation  type  with 
one  input  layer,  one  hidden  layer,  and  one  output  layer.  The  number  of  processing  elements  (PEs) 
in  the  input  layer  varied  with  the  vibrating  system  between  12  and  15.  The  number  of  output  PEs 
also  varied  with  the  vibrating  system,  according  to  the  number  of  fault  conditions,  including  the 
normal cases. Because of the convexity and linear separability of the feature vector clusters for all the vibrating systems of Phase I, the number of hidden-layer PEs was set equal to the number of output PEs. Table 2-4 presents the number of PEs per layer for each of these systems.


TABLE 2-4. NUMBER OF PROCESSING ELEMENTS PER ANN LAYER

LAYER        HELICOPTER GEARBOX    CONDENSATE PUMPS    FIRE PUMPS
Input                15                   12               13
Hidden                6                    8               16
Output                6                    8               16
Total PEs            27                   28               45

For  the  design,  training,  and  testing  of  the  ANNs  we  used  a  commercial  software  package, 
NeuralWorks  (NeuralWare,  1992),  running  on  a  Macintosh  platform.  For  each  of  the  three 
systems,  convergence  to  the  specified  RMS  error  of  the  difference  between  the  desired  and  the 
actual  outputs  occurred  relatively  fast — after  between  5,000  and  10,000  random  presentations  of 
the feature vectors included in the training set. Given the simplicity of the ANNs used in this work, their small size, and the excellent feature cluster separation made possible by the judicious use of the CWT, no sophisticated training algorithms were required.
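Because the clusters are compact and separable, a very small network trained by plain back-propagation suffices. The sketch below is a generic NumPy stand-in for the commercial NeuralWorks package used in Phase I, not a reproduction of its settings: the sigmoid units, learning rate, weight initialization, RMS stopping target, and the class name `TinyBackpropNet` are all assumptions. It follows the text in setting the number of hidden PEs equal to the number of outputs and in training on random presentations until a specified RMS error is reached.

```python
import numpy as np

class TinyBackpropNet:
    """Input -> sigmoid hidden layer -> sigmoid output layer, trained by
    per-sample back-propagation of the squared output error.
    Assumptions: sigmoid units, N(0, 0.5) initial weights, fixed learning rate."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        self.rng = np.random.default_rng(seed)
        self.w1 = self.rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = self.rng.normal(0.0, 0.5, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(self, x):
        hidden = self._sigmoid(x @ self.w1 + self.b1)
        return hidden, self._sigmoid(hidden @ self.w2 + self.b2)

    def rms_error(self, X, T):
        return float(np.sqrt(np.mean((self.forward(X)[1] - T) ** 2)))

    def predict(self, X):
        return np.argmax(self.forward(X)[1], axis=1)

    def train(self, X, T, lr=0.5, target_rms=0.05, max_presentations=20_000, check_every=500):
        """Present randomly chosen training vectors until the RMS error over the whole
        training set falls below target_rms; return the number of presentations used."""
        for n in range(1, max_presentations + 1):
            i = self.rng.integers(len(X))                        # random presentation
            hidden, y = self.forward(X[i])
            delta_out = (y - T[i]) * y * (1 - y)                 # output-layer error term
            delta_hidden = (self.w2 @ delta_out) * hidden * (1 - hidden)  # back-propagated
            self.w2 -= lr * np.outer(hidden, delta_out)
            self.b2 -= lr * delta_out
            self.w1 -= lr * np.outer(X[i], delta_hidden)
            self.b1 -= lr * delta_hidden
            if n % check_every == 0 and self.rms_error(X, T) < target_rms:
                return n
        return max_presentations

# Usage on three synthetic, well-separated clusters (one-hot targets).
rng = np.random.default_rng(1)
centres = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
labels = np.repeat(np.arange(3), 50)
X = centres[labels] + rng.normal(0.0, 0.3, (150, 2))
T = np.eye(3)[labels]
net = TinyBackpropNet(n_in=2, n_hidden=3, n_out=3)   # hidden PEs = output PEs
presentations = net.train(X, T)
accuracy = float(np.mean(net.predict(X) == labels))
```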
From the test set results, we computed the following measures of effectiveness for each vibrating system: probability of false alarm, probability of missed detection, probability of misclassification, and probability of deferral. The probability of false alarm is the probability that a fault is announced when no fault is present. The probability of missed detection is the probability that no fault is announced when a fault is present. The probability of misclassification is the probability that a fault type is announced when a different fault type is present. The probability of deferral is the probability that the classifier defers a decision when a case for decision (a feature vector) is presented to it.
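The four measures can be computed from a list of true cases and announced decisions, as sketched here. The report defines the events but not the denominators, so conditioning false alarms on normal vectors, misses and misclassifications on fault vectors, and deferrals on all presented vectors are our assumptions, as is representing a deferral by None and the function name.

```python
def effectiveness(true_labels, decisions, normal_labels=frozenset({"Normal"})):
    """Estimate the four measures of effectiveness from test-set decisions.

    true_labels: true case of each presented feature vector.
    decisions:   announced case for each vector, or None if the classifier deferred.
    Assumed denominators: normal vectors for false alarms, fault vectors for missed
    detections and misclassifications, all vectors for deferrals."""
    n_normal = n_fault = 0
    false_alarms = misses = misclassified = deferred = 0
    for truth, decision in zip(true_labels, decisions):
        is_normal = truth in normal_labels
        n_normal += is_normal
        n_fault += not is_normal
        if decision is None:
            deferred += 1
        elif is_normal and decision not in normal_labels:
            false_alarms += 1                    # fault announced, none present
        elif not is_normal and decision in normal_labels:
            misses += 1                          # fault present, none announced
        elif not is_normal and decision != truth:
            misclassified += 1                   # wrong fault type announced
    total = n_normal + n_fault
    return {
        "false_alarm": false_alarms / n_normal if n_normal else 0.0,
        "missed_detection": misses / n_fault if n_fault else 0.0,
        "misclassification": misclassified / n_fault if n_fault else 0.0,
        "deferral": deferred / total if total else 0.0,
    }

# Usage: one false alarm, one miss, one deferral, one misclassification.
rates = effectiveness(["Normal", "Normal", "IR", "IR", "SP"],
                      ["Normal", "IR", "Normal", None, "IR"])
```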

For  the  purpose  of  this  work,  a  feature  vector  leads  to  an  ambiguous  situation  when  the 
absolute  difference  between  the  two  largest  competing  outputs  is  less  than  some  specified 
tolerance.  In  these  cases,  the  classifier  refuses  to  announce  a  decision  and  considers  the  next 
feature  vector.  The  consequence  of  this  deferral  is  to  decrease  the  probabilities  of  false  alarm  and 
missed  detections,  and  to  increase  the  time  delay  for  a  classifier  decision.  For  instance,  feature 
vectors  for  the  gearbox  system  are  computed  every  10  msec,  so  the  price  paid  in  time  delay  for 
each  deferral  (or  rejection)  is  only  10  msec. 
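The deferral rule can be sketched as a small decision function. This subsection states it as an absolute difference between the two largest outputs, while Section 2.6 describes its acceptance threshold as a ratio of those outputs, so the sketch supports both readings; the function name and the `mode` switch are hypothetical.

```python
import numpy as np

def decide(outputs, class_names, tolerance, mode="difference"):
    """Return the winning class name, or None to defer to the next feature vector.

    mode="difference": accept when (largest - second largest) >= tolerance.
    mode="ratio":      accept when largest / second largest >= tolerance."""
    outputs = np.asarray(outputs, dtype=float)
    order = np.argsort(outputs)[::-1]
    best, runner_up = outputs[order[0]], outputs[order[1]]
    if mode == "difference":
        accept = best - runner_up >= tolerance
    elif mode == "ratio":
        accept = runner_up <= 0 or best / runner_up >= tolerance
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return class_names[order[0]] if accept else None

# Usage: the two largest outputs are too close, so the decision is deferred.
names = ["Normal", "Inner Race", "Gear Spall"]
deferred = decide([0.1, 0.8, 0.75], names, tolerance=0.1)   # None
```

Each None costs one feature-vector period of delay (10 msec for the gearbox) before the next vector is considered.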

The computation rate of feature vectors depends on the time constants selected for the wavelet preprocessor. Extracting feature vectors at a period no shorter than the largest of these time constants makes successive vectors (approximately) statistically independent. This period can be quite short. Table 2-5 presents the maximum time constants and the number of feature vectors per second allowed by those time constants for the three vibrating systems of Phase I.

We  deliberately  avoided  using  all  available  techniques  to  solve  the  Phase  I  problem.  We 
arbitrarily  limited  ourselves  to  single-channel  processing,  with  instantaneous  classification,  in 
order  to  have  some  additional  processing  techniques  available  to  deal  with  expected  additional 
complications  contained  in  real  data. 
TABLE 2-5. PREPROCESSOR TIME CONSTANTS AND FEATURE VECTOR RATES

UNIT                  MAX TIME CONSTANT    FEATURE VECTORS PER SECOND
Helicopter gearbox         10 msec                    100
Condensate pump            25 msec                     40
Fire pump                  25 msec                     40

One  of  these  techniques  is  temporal  fusion.  We  compute  feature  vectors  as  fast  as  possible 
while  maintaining  statistical  independence  (i.e.,  at  a  rate  limited  by  the  longest  time  constant  in  the 
preprocessor).  For  the  gearbox  data,  we  computed  feature  vectors  every  10  msec,  and  classified 
each and every one. To make the classification process robust against transient disturbances, we can simply compare output classifications over a window of, say, 100 feature samples (one second of data). If, say, fewer than 95 of the classifications agree, we assume a disturbance (and not a fault) is present. This dramatically reduces the theoretical probability of false alarm, at the cost of an additional second of delay in producing a warning, a very attractive tradeoff for most situations.
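The windowed vote just described might look like this. The class name and the choice to return the string "disturbance" are hypothetical; the 100-sample window and the 95-vote agreement threshold follow the example in the text.

```python
from collections import Counter, deque

class TemporalFusion:
    """Sliding-window vote over per-vector classifications (hypothetical helper).

    A class is reported only when at least `min_agree` of the last `window`
    per-vector decisions agree; otherwise a disturbance is assumed."""

    def __init__(self, window=100, min_agree=95):
        self.min_agree = min_agree
        self.history = deque(maxlen=window)

    def update(self, label):
        """Add one per-vector decision and return the fused decision, None while the
        window is still filling, or "disturbance" when agreement is too low."""
        self.history.append(label)
        if len(self.history) < self.history.maxlen:
            return None
        winner, count = Counter(self.history).most_common(1)[0]
        return winner if count >= self.min_agree else "disturbance"

# Usage: one second of gearbox decisions at 100 feature vectors per second.
fusion = TemporalFusion()
fused = [fusion.update("Normal") for _ in range(100)]   # last entry is "Normal"
```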

2.6 FAULT DETECTION AND IDENTIFICATION RESULTS FROM PHASE I

Given  the  preceding  insight  into  the  derivation  of  high-energy  wavelet  features  and  the 
convex,  separable  clusters  they  form  in  feature  space,  it  should  be  no  surprise  that  good 
classification  results  are  possible.  Providing  a  feature  set  that  captures  the  important  discriminants 
between  normal  operation  and  faults  vastly  simplifies  the  problem  of  designing  an  adaptive 
classifier  that  achieves  good  performance. 

The performance results for each of the test systems are presented in Table 2-6. The acceptance threshold is the ratio between the maximum output value and the next largest output value for a given feature vector. The complexity value is the total number of PEs in the corresponding ANN. For the gearbox and the condensate pump systems, the test set was independent of the training set; for the fire pump data, these two sets were the same.
TABLE 2-6. PHASE I PERFORMANCE RESULTS

                                    GEARBOX    CONDENSATE PUMP    FIRE PUMP
TRAINING SET SIZE                     1125           240              480
TEST SET SIZE                         6750          1400             4800
ACCEPTANCE THRESHOLD                   1.4           1.2              2.0
PROBABILITY OF FALSE ALARM           0.000         0.000            0.000
PROBABILITY OF MISSED DETECTION      0.000         0.000            0.000
PROBABILITY OF DEFERRAL              0.035         0.020            0.020
PROBABILITY OF MISCLASSIFICATION     0.046         0.000            0.000
COMPLEXITY                          27 PEs        28 PEs           45 PEs

The  performance  results  in  Table  2-6  clearly  show  that  the  wavelet  feature  sets  selected 
above permit perfect detection performance with low deferral rates. While these results are
pleasing,  we  feel  that  an  even  more  important  principle  has  been  demonstrated.  We  used  exactly 
the  same  method  to  find  features  for  the  pumps  as  we  used  for  the  gearbox  data.  There  was  no 
trial  and  error  for  the  pump  classifiers — these  results  are  from  the  very  first  feature  sets  we  picked. 
This  offers  limited  but  important  evidence  that  our  results  are  not  accidental — that  we  have  a 
methodology  to  analyze  data  from  vibrating  systems  and  derive  small,  focused  feature  sets  that 
support  high-confidence  fault  detection  and  classification. 
SECTION 3

CONCLUSIONS  AND  RECOMMENDATIONS 

3.1  CONCLUSIONS  FROM  THE  PHASE  I  EFFORT 

Our  Phase  I  performance  results  speak  for  themselves.  Incipient  fault  detection  and 
classification  on  bench-test  and  mild-operation  data  is  quite  feasible,  even  without  exploiting  many 
additional  techniques  available.  The  CWT  provides  images  from  which  feature  selection  is  easy. 
These  features  are  simple  and  robust  (high  energy,  narrow  bandwidth). 

There are no technological impediments to practical implementation. The selected wavelet features can be computed using off-the-shelf digital filtering hardware. ANNs can be trained and employed using off-the-shelf techniques.

The  helicopter  gearbox  data  used  was  bench  test  data.  Bench  test  data  cannot  support  a 
complete  characterization  of  normal  operating  regimes  and  cannot  include  a  complete  set  of 
disturbances  to  be  encountered  in  the  field.  Moreover,  seeded  fault  data  cannot  span  the  complete 
set  of  possible  failures,  and  may  contain  artifacts  that  assist  detection.  In  summary,  bench  tests  are 
not  reality.  Reality  is  much  less  controlled,  and  hence  much  less  predictable.  Although  the  pump 
data  were  obtained  under  actual  operating  conditions,  these  conditions  were  relatively  benign  and 
do not include all possible operating conditions. Therefore, the principal conclusions we draw from Phase I are that these results (or any Phase I results) should be greeted with skepticism concerning their applicability to field systems, and that any Phase II effort must focus on support for analysis and design with real data under widely varied operating regimes.

We are well positioned to make the transition to real data in Phase II because of the methodology we established in Phase I. We have a specific procedure for selecting low-dimensional, robust feature sets, and have demonstrated its efficacy on the two pump data sets.
We deliberately avoided using all available processing techniques to solve the Phase I problem; we wanted to be sure that a clean problem could be solved with simple techniques. We arbitrarily limited ourselves to single-channel processing, with instantaneous classification. This leaves additional processing techniques available to meet the challenges posed by real data, and we propose to exploit them to the fullest in Phase II.

3.2 PHASE II RECOMMENDATIONS

Phase II must address two classes of issues: 1) continued development of integrated wavelet/ANN techniques for incipient fault detection, and 2) preparation for transition to Phase III through the development of: a) a generic off-line software design suite and b) a real-time hardware feature extraction capability.

While  Phase  I  demonstrated  the  feasibility  of  achieving  good  fault  detection  and 
classification  performance,  the  methods  employed  were  neither  terribly  efficient  nor  able  to 
incorporate  all  sources  of  information.  The  major  technical  issues  to  be  resolved  relate  to  these  two 
areas — making  the  feature  selection  more  efficient,  and  being  able  to  fuse  other  sources  of  failure 
information  (e.g.,  multichannel  data)  into  the  classifier. 

To achieve efficiency, one must integrate the software tools used to develop the Phase I results (CWT routines, statistical analysis of candidate features, creation and editing of feature sets, ANN training and testing, ANN performance characterization, and deferral threshold trades), and extend the technological bases of several of them. Phase II must consolidate all of the requisite functionality currently provided by multiple software packages on two hardware platforms into a single design package. KHOROS provides a perfectly acceptable shell for all of these functions.
The  objective  of  Phase  II  in  this  area  is  thus  to  migrate  all  of  these  functions,  along  with  any  new 
ones  developed  in  the  process  of  resolving  the  technical  issues  raised  below,  into  a  unified  wavelet 
feature  analysis  and  selection  software  environment. 

Some  of  the  technical  areas  demanding  additional  research  attention  include  the  following: 
Can  one  perform  the  statistical  evaluation  of  wavelet  features  directly  in  wavelet-transform  space 
(i.e.,  for  all  candidate  features)  rather  than  for  selected  features  in  a  post-transform  analysis?  Can 
one apply advanced multiscale estimation techniques to extract different features at different time scales? How effective might some additional wavelet feature types be (e.g., wavelet packet coefficients for switching systems, or multiscale autoregressive model identification)? Can recent
work  on  the  interpretation  of  ANN  outputs  as  likelihood  functions  be  directly  linked  to  the 
statistical  performance  analysis  currently  done  in  feature  space? 

In  addition,  one  is  not  limited  to  single-channel  accelerometer  data  in  many  potential 
applications.  How  can  one  process  additional  channels?  Is  there  advantage  to  using  a  vector 
wavelet  transform  to  process  all  channels  simultaneously?  Can  any  statistical  techniques  derived 
for  use  in  wavelet  transform  space  be  adapted  to  vector  transforms?  Is  there  a  need  for  different 
techniques  on  different  channels?  To  what  extent  can  accelerometers  mounted  on  the  supports  of  a 
vibrating  system,  rather  than  on  its  casing,  supply  information  about  environmental  vibrations  and 
disturbances? How should this information influence the feature-selection process?

Finally, there are several issues related to incorporating incipient failure detectors into a genuine Navy maintenance concept. What are the relative merits of detection alone vs. detection and classification? What are acceptable false alarm rates? Are there other meaningful outputs from an incipient fault detector (e.g., rate of development of an anomaly) that can be useful to an operator? And, of course, what response time is required, and how can that time be used to process a series of feature vectors in order to reduce false alarms and increase detection reliability?

Transition to Phase III demands a generic product that can select and extract wavelet
features  for  any  specific  application.  This  product  must  consist  of  two  parts.  The  first  is  a  design 
capability,  which  an  engineer  can  use  to  select  feature  sets  relevant  to  any  particular  application. 
The  second  is  a  real-time  feature  extractor,  which  (after  setting  some  parameters  to  values 
determined  during  the  design  effort)  will  produce  digitally  sampled  feature  vectors  in  real  time. 

Phase  II  must  also  address  computational  efficiency.  The  wavelet  feature  extractor  in 
Phase  I  executed  about  100  times  slower  than  real  time.  Most  of  the  functions  are  simple  filtering 
operations,  and  many  can  be  done  in  parallel  (one  wavelet  feature  per  channel).  Since  the  feature 
extraction  relies  on  one  generic  set  of  computations  for  each  feature,  parameterized  to  suit  a 
particular application, now is the time to migrate these functions into simple, generic hardware. Design options range from the use of standard DSP chips to a hybrid analog front-end/microprocessor back-end. Since accelerometers (and most foreseeable other sensors) supply
data  at  audio  frequencies,  there  should  be  no  need  to  push  the  state  of  the  hardware  art  in  signal 
processing  hardware  design — off-the-shelf  components  should  easily  provide  the  necessary 
performance  and  physical  reliability. 

Thus, Phase II has the central objective of producing a generic capability to rapidly design
and  implement  incipient  failure  detectors  for  a  wide  range  of  Navy  applications.  This  capability 
should  consist  of  an  off-line  design  software  suite  for  feature  selection,  ANN  training,  and 
performance  evaluation.  It  should  also  include  a  hardware  element  that  can  be  tuned  to  extract  a 
range  of  wavelet  features,  in  real  time,  so  that  the  same  hardware  element  can  be  used  for  a  variety 
of  applications. 

Having  a  capability  to  demonstrate  incipient  fault  detection  is  not  enough,  however.  There 
must  be  ample  evidence  that  the  capability  delivers  products  that  work.  Therefore,  a  final  objective 
of Phase II must be not only to develop the wavelet feature selection and extraction capability, but
to  demonstrate  it  on  large  volumes  of  real  data. 
APPENDIX A

THE  CONTINUOUS  WAVELET  TRANSFORM 

This  appendix  presents  the  basics  of  the  Continuous  Wavelet  Transform,  which  underlies 
our  methodology  for  selecting  features. 

A.1 WAVELET BASES

Wavelets  are  a  new  approach  to  an  old  problem:  building  complicated  functions  out  of 
simple  elements.  Fourier  analysis  builds  complicated  functions  out  of  sine  and  cosine  functions. 
Wavelets  use  functions  with  limited  time  extent,  so  that  they  can  better  represent  time-localized 
aspects  of  a  function  than  can  a  sum  of  infinite-duration  sines  and  cosines. 

All  wavelet  analyses  are  based  on  dilations,  contractions,  and  translations  of  a  basic  mother 
wavelet.  If  the  dilations,  contractions,  and  translations  are  orthogonal  to  one  another,  then  the 
wavelets  form  an  orthonormal  basis  which  is  complete  in  many  cases.  One  unique  characteristic  of 
wavelet  analysis  is  that  there  are  an  infinite  number  of  choices  for  a  mother  wavelet,  allowing  one 
to  continuously  vary  the  tradeoff  between  time  and  frequency  localization.  Figure  A-1  shows  the 
Haar wavelet, on the left, and the Kiang wavelet, on the right. Figure A-2 shows dilated and contracted translates of the Haar wavelet.


Figure  A-1.  The  Haar  and  Kiang  Wavelets 

In  its  most  general  form,  a  mother  wavelet  is  any  essentially  time-  and  band-limited 
function  of  t,  subject  to  the  uncertainty  principle  limitations.  (The  uncertainty  principle  states  that 
good  time  localization  is  obtained  at  the  expense  of  frequency  localization,  and  vice  versa.) 
Figure A-2. Dilated Translates of the Haar Wavelet


An  affine  wavelet  family  is  obtained  from  dilations  and  translations  of  the  mother  wavelet. 
Affine wavelets can be either continuous or discrete (usually dyadic). Dyadic wavelet families dilate by powers of two, and translate by integer multiples of powers of two. Continuous wavelet
families  dilate  by  arbitrary  scale  factors,  and  translate  by  arbitrary  amounts. 
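The orthonormality of a dyadic Haar family can be checked numerically. In this sketch the family is written as psi_jk(t) = 2^(-j/2) psi(2^(-j) t - k), so j > 0 dilates and j < 0 contracts; that normalization, the choice of scales j = -1, 0, 1, and the interval [0, 4) are our illustrative assumptions. The Gram matrix of inner products should come out as the identity.

```python
import numpy as np

def haar(t):
    """Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    t = np.asarray(t, dtype=float)
    return np.where((t >= 0) & (t < 0.5), 1.0, 0.0) - np.where((t >= 0.5) & (t < 1), 1.0, 0.0)

def haar_jk(t, j, k):
    """Dilated (j > 0) or contracted (j < 0) translate, assumed normalization:
    2**(-j/2) * psi(2**(-j) * t - k)."""
    return 2.0 ** (-j / 2) * haar(2.0 ** (-j) * np.asarray(t, dtype=float) - k)

# Inner products on [0, 4) by Riemann sum; every breakpoint lies on the grid.
dt = 1 / 1024
t = np.arange(0, 4, dt)
family = [(j, k) for j in (-1, 0, 1) for k in range(int(4 / 2 ** j))]
gram = np.array([[np.sum(haar_jk(t, *a) * haar_jk(t, *b)) * dt for b in family]
                 for a in family])
```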

A.2  WAVELETS  AND  SPECTRA 

Each  mother  wavelet  has  a  characteristic  footprint  in  time/frequency  space — a  region  where 
it  is  sensitive  to  energy.  Dilate  it  (or  contract  it)  by  a  factor  of  two,  and  this  region  expands  (or 
contracts)  by  a  factor  of  two  along  the  time  axis,  and  translates  by  an  octave  down  (or  up)  along 
the  frequency  axis.  Translate  it,  and  the  footprint  moves  an  equal  amount  along  the  time  axis. 
Take  a  set  of  dilations,  contractions,  and  translations  that  cover  all  of  time/frequency  space,  and 
you  have  a  complete  basis  set.  Figure  A-3  illustrates  a  typical  subdivision  by  wavelets  of  the  time- 
frequency  plane. 


[Figure A-3 labels: area covered by different wavelet basis functions; horizontal axis: time localization]

Figure  A-3.  Typical  Subdivision  by  Wavelets  of  the  Time-Frequency  Plane 
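The scaling behavior described above follows from the Fourier scaling and shift theorems. Writing Psi for the Fourier transform of psi, Delta t and Delta f for the time and frequency widths of its footprint (under any consistent spread measure), f_c for its center frequency, and a > 0 for the dilation factor (notation introduced here for illustration):

```latex
\begin{gathered}
\psi_a(t) = \psi(t/a) \;\Longrightarrow\; \Psi_a(f) = |a|\,\Psi(af),
\qquad
\psi(t-b) \;\Longrightarrow\; e^{-i2\pi f b}\,\Psi(f),\\
\Delta t_a = a\,\Delta t, \qquad \Delta f_a = \Delta f / a, \qquad f_{c,a} = f_c / a,\\
\Delta t_a\,\Delta f_a = \Delta t\,\Delta f, \qquad \Delta f_a / f_{c,a} = \Delta f / f_c,\\
a = 2:\quad \log_2 f_{c,2} = \log_2 f_c - 1 .
\end{gathered}
```

Translation changes only the phase of the spectrum, so the footprint slides along the time axis without changing shape. Because the ratio of bandwidth to center frequency is unchanged by dilation, the footprint keeps its height on a logarithmic (octave) frequency axis while its time extent doubles and it drops one octave per doubling of a.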
The exciting discoveries that kindled the recent interest in wavelets concern the existence of
such  complete  bases  that  are  also  orthonormal,  and  for  which  the  mother  wavelet  has  finite  support 
(is  non-zero  over  a  finite  length  of  time). 

A.3  WAVELET  TRANSFORMS 

Because  there  are  an  infinite  number  of  mother  wavelets,  there  are  an  infinite  number  of 
wavelet  transforms.  By  varying  the  choice  of  the  mother  wavelet,  one  can  get  wavelet  transforms 
that  differ  in  both  time  and  frequency  localization. 

An  affine  wavelet  transform  is  the  set  of  coefficients  that  multiply  elements  of  a  wavelet 
family  in  order  to  represent  a  time  function.  Affine  wavelet  transforms  can  be  either  continuous  or 
discrete. Continuous wavelet transforms yield a complex-valued function of time and scale (frequency). Discrete (usually dyadic) wavelet transforms yield a finite number of coefficients for
an  essentially  bounded  region  of  time  and  scale  (frequency).  All  wavelet  transforms  contain 
implicit  or  explicit  feature  detectors  (one  for  each  basis  function).  Figure  A-4  presents  a 
classification  of  affine  and  non-affine  wavelets  based  on  their  support  (columns)  and  extent  of  their 
filter  realizations  (rows). 
Figure A-4. Classification of Wavelets
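A continuous transform can be sketched directly from the usual definition W(a, b) = |a|^(-1/2) times the integral of x(t) psi*((t - b)/a) dt. The Kiang wavelet's formula is not given in this appendix, so the example substitutes a complex Morlet-type mother wavelet with omega0 = 6; the 1/sqrt(a) normalization, the truncation at ±5a, the scale-to-frequency conversion a = omega0/(2 pi f), and the function names are our assumptions. For a pure tone, the row with the largest average magnitude should sit at the tone's frequency.

```python
import numpy as np

def mother(t, omega0=6.0):
    """Complex Morlet-type mother wavelet, an assumed stand-in for the Kiang wavelet."""
    return np.pi ** -0.25 * np.exp(1j * omega0 * t) * np.exp(-t ** 2 / 2)

def cwt(x, fs, scales, omega0=6.0):
    """W(a, b) for each scale a (seconds, one row per scale) and each sample time b:
    W(a, b) = a**-0.5 * sum_n x[n] * conj(psi((t_n - b) / a)) / fs."""
    x = np.asarray(x, dtype=float)
    out = np.empty((len(scales), len(x)), dtype=complex)
    for row, a in enumerate(scales):
        half = int(np.ceil(5 * a * fs))                 # truncate psi at +/- 5a
        t = np.arange(-half, half + 1) / fs
        kernel = np.conj(mother(t / a, omega0)) / np.sqrt(a) / fs
        # Correlating x with the kernel = convolving with the reversed kernel;
        # slicing the full convolution aligns output sample b with input sample b.
        full = np.convolve(x, kernel[::-1])
        out[row] = full[half:half + len(x)]
    return out

# Usage: a 1 kHz tone analysed on a quarter-octave grid from 250 Hz to 4 kHz.
fs = 48_000
x = np.sin(2 * np.pi * 1_000 * np.arange(4_800) / fs)
freqs = 1_000 * 2.0 ** (np.arange(-8, 9) / 4)
scales = 6.0 / (2 * np.pi * freqs)                    # a = omega0 / (2 pi f)
W = cwt(x, fs, scales)
profile = np.abs(W[:, 1_200:3_600]).mean(axis=1)      # ignore the edges
```

Like the plates in Section A.4, a display of np.abs(W) with frequency on a logarithmic vertical axis would show the tone as a horizontal band; phase is simply discarded by taking the magnitude.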

Wavelet selection affects preprocessor design, because efficient real-time feature extraction imposes constraints on the possible mother wavelets. We require causality and a finite-dimensional realization of the wavelet generator. In this effort, we constrained the choice of mother wavelet by the need to implement a feature extractor without requiring advances in electronics technology. In particular, we limited our choices to wavelets whose corresponding filters have a finite-dimensional realization. All wavelets with finite support possess this property, but they must be very long to localize the narrowband energy that is so obvious in the gearbox data. We therefore selected a wavelet with semi-infinite support, which has both a causal realization and a low-dimensional implementation. This wavelet was derived from auditory nerve response data collected by Kiang in the 1960s, and we have honored him by naming the wavelet after him.
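The Kiang wavelet itself is not specified in this section, so the sketch below uses a stand-in to show why a causal impulse response with semi-infinite support can still have a low-dimensional realization. The damped sinusoid h[n] = r^n sin((n+1)ω)/sin ω is zero for n < 0 and continues indefinitely for n ≥ 0. It is nonetheless generated exactly by the two-state recursion y[n] = 2r cos(ω) y[n-1] - r² y[n-2] + x[n]. A finite-support wavelet long enough to localize a narrow spectral line would instead need an FIR filter with one coefficient per sample of its length. Strictly speaking, this stand-in is not exactly zero-mean, so it only approximates an admissible wavelet. The point of the sketch is the realization. The function names, pole radius, and center frequency are hypothetical.

```python
import math


def resonator_response(x, r, omega):
    """Causal two-pole resonator: y[n] = 2 r cos(omega) y[n-1] - r^2 y[n-2] + x[n].

    Only two state values are kept, however long the response lasts.
    """
    a1 = 2.0 * r * math.cos(omega)
    a2 = r * r
    y1 = y2 = 0.0  # y[n-1], y[n-2]; zero initial state makes the filter causal
    out = []
    for v in x:
        y = a1 * y1 - a2 * y2 + v
        out.append(y)
        y2, y1 = y1, y
    return out


def damped_sinusoid(n, r, omega):
    """Closed-form impulse response of the resonator above, for n >= 0."""
    return r ** n * math.sin((n + 1) * omega) / math.sin(omega)


if __name__ == "__main__":
    fs = 48_000.0   # sample rate used for the examples in A.4
    f0 = 1_000.0    # hypothetical center frequency, Hz
    r = 0.995       # hypothetical pole radius (sets the decay rate)
    omega = 2.0 * math.pi * f0 / fs
    impulse = [1.0] + [0.0] * 199
    h = resonator_response(impulse, r, omega)
    err = max(abs(h[n] - damped_sinusoid(n, r, omega)) for n in range(len(h)))
    print(f"max deviation from closed form: {err:.2e}")
```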

A.4 SIMPLE EXAMPLES OF CONTINUOUS WAVELET TRANSFORMS

This section presents color images of Continuous Wavelet Transforms (CWTs) of some basic functions to give the reader greater insight into the time-frequency representation of signals provided by the CWT based on the Kiang wavelet. In these images, the frequency scale is logarithmic and runs vertically, increasing to the top, with frequency divisions corresponding to octaves from 16 Hz to 16 kHz. The time scale runs horizontally, increasing to the right, with time divisions 62.5 msec wide. Hue encodes the log magnitude of the CWT (blue = low, red = high). Phase information is ignored.
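The display axes can be reproduced with a few lines of arithmetic. This sketch assumes exact octave spacing starting at 16 Hz, which puts the top division at 16 × 2^10 = 16384 Hz (the "16 kHz" of the text). It also assumes the 48 kHz sample rate used in the examples below. The 60 dB range in the magnitude-to-hue mapping is a hypothetical choice, since the report does not state one.

```python
import math

FS = 48_000          # samples per second (rate used for the examples)
F_LOW = 16.0         # bottom of the frequency axis, Hz
N_OCTAVES = 10       # 16 Hz * 2**10 = 16384 Hz, about 16 kHz
DIVISION_S = 0.0625  # width of one time division, seconds

octave_edges = [F_LOW * 2 ** k for k in range(N_OCTAVES + 1)]
samples_per_division = round(FS * DIVISION_S)


def hue_fraction(magnitude, peak, range_db=60.0):
    """Map |CWT| to [0, 1] on a log scale: 0 -> blue (low), 1 -> red (high).

    range_db is a hypothetical display dynamic range.
    """
    if magnitude <= 0.0:
        return 0.0
    db = 20.0 * math.log10(magnitude / peak)  # 0 dB at the peak
    return min(1.0, max(0.0, 1.0 + db / range_db))


print(octave_edges[0], octave_edges[-1])  # 16.0 16384.0
print(samples_per_division)               # 3000
```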

A.4.1 Pulse and Sine Wave

The left image in Plate K is a wavelet transform, using the Kiang wavelet, of a 1 msec pulse sampled at 48 kHz. Note the nulls at multiples of 1 kHz: the Kiang wavelet is highly oscillatory and is therefore nearly orthogonal to pulses that span an integral number of its complete cycles. Note also that the finer scales locate the time of the pulse very precisely, and that there are no windowing effects.
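The nulls can be checked against the spectrum of the pulse itself. A sinusoid at m kHz completes exactly m cycles during a 1 msec pulse, so its correlation with the pulse is zero. Equivalently, the discrete-time Fourier transform of the 48-sample pulse (1 msec at 48 kHz) vanishes at every multiple of 1 kHz. The sketch below evaluates that transform directly. The function name is hypothetical.

```python
import cmath
import math

FS = 48_000.0
PULSE = [1.0] * 48  # 1 msec of ones at 48 kHz


def dtft_magnitude(x, f, fs=FS):
    """|X(f)| for a finite sequence x sampled at fs, evaluated directly."""
    return abs(sum(v * cmath.exp(-2j * math.pi * f * n / fs)
                   for n, v in enumerate(x)))


for f in (500.0, 1000.0, 1500.0, 2000.0, 3000.0):
    # magnitudes at 1, 2, and 3 kHz print as 0.0000
    print(f"{f:6.0f} Hz: {dtft_magnitude(PULSE, f):8.4f}")
```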

The right image is a wavelet transform of a stepped sine wave. Note that the initial transients quickly give way to a very tight localization at the frequency of the sine. Note also that the transients decay more slowly at the coarser scales at the right of the figure.
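The slower decay at coarse scales is what a constant-Q analysis would predict. Assume, hypothetically, that each wavelet behaves like a narrowband two-pole resonator with bandwidth B = f/Q. Its envelope then decays with time constant τ = 1/(πB) = Q/(πf), so halving the center frequency doubles the settling time. The Q value below is illustrative only.

```python
import math

Q = 4.0  # hypothetical quality factor (center frequency / bandwidth)


def decay_time_constant(f_center, q=Q):
    """Envelope time constant (s) of a resonator with bandwidth f_center / q."""
    bandwidth = f_center / q
    return 1.0 / (math.pi * bandwidth)


for f in (16.0, 128.0, 1024.0, 8192.0):
    tau_ms = 1e3 * decay_time_constant(f)
    print(f"{f:7.0f} Hz: tau = {tau_ms:8.3f} ms")
```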

A.4.2 Superposition Examples

The left image in Plate L is a wavelet transform of the sum of the two functions in Plate K. It provides clear evidence that a single transform can localize transient events well in time, while at the same time localizing stationary frequencies.
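The two features can appear together without interfering because the transform is linear in the signal, which follows directly from the integral definition of the CWT. Writing W_x(a,b) for the wavelet transform of x at scale a and time shift b:

```latex
W_{x+y}(a,b) = W_x(a,b) + W_y(a,b)

\left|W_{x+y}(a,b)\right|^{2} = \left|W_x(a,b)\right|^{2} + \left|W_y(a,b)\right|^{2}
  + 2\,\operatorname{Re}\!\left\{ W_x(a,b)\, W_y^{*}(a,b) \right\}
```

The displayed log magnitude is not itself additive. The cross term matters only where both components have significant energy at the same (a, b). The pulse is concentrated in time and the sine in frequency, so those regions barely overlap, and the image of the sum looks much like the two images laid over each other.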

The right image is a wavelet transform of the sum of two stepped sine waves of identical amplitude and similar frequency. It provides a dramatic visualization of beat effects: the peak pattern varies periodically between two separate, lower-energy peaks and a single, higher-energy peak. As the difference between the two center frequencies increases, a proportionally larger part of each beat cycle is occupied by the region with two distinct frequency peaks. The detail of this beat structure is not typically present in sonograms, lofargrams, or waterfall displays, and it is this visibility into time-frequency variations of signal structure that is the advantage of the CWT.
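The beat pattern follows from the sum-to-product identity for two equal-amplitude sinusoids at frequencies f_1 and f_2:

```latex
\cos(2\pi f_1 t) + \cos(2\pi f_2 t)
  = 2\cos\!\left(\pi (f_1 - f_2)\, t\right)\cos\!\left(\pi (f_1 + f_2)\, t\right)
```

The sum is therefore a tone at the mean frequency (f_1 + f_2)/2 whose envelope |2cos(π(f_1 - f_2)t)| rises and falls |f_1 - f_2| times per second. This beat rate is the repetition rate one would expect for the alternating pattern in the image.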

A.4.3 Noise Examples

The left image in Plate M is a wavelet transform of a white Poisson process emitting 1-msec pulses with a mean interarrival time of 10 msec. Note the clear separation between pulses at the finer scales at the left of the image, in contrast to the random texture at the coarser scales at the right. This signal clearly has no structure that is localized in frequency.
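A test signal of this kind is easy to synthesize. The sketch below draws exponentially distributed interarrival times with a 10 msec mean, places a 1 msec unit-amplitude pulse at each arrival, and samples at 48 kHz. Several details are assumptions rather than information from the report: the function name, the unit amplitude, and measuring interarrival from one pulse onset to the next (so that pulses may overlap and add).

```python
import random

FS = 48_000                  # samples per second
PULSE_S = 0.001              # pulse width: 1 msec
MEAN_INTERARRIVAL_S = 0.010  # mean time between pulse onsets: 10 msec


def poisson_pulse_train(duration_s, seed=None):
    """Return (samples, onset_times) for a Poisson train of rectangular pulses."""
    rng = random.Random(seed)
    n_samples = int(duration_s * FS)
    width = int(round(PULSE_S * FS))  # 48 samples
    x = [0.0] * n_samples
    onsets = []
    t = rng.expovariate(1.0 / MEAN_INTERARRIVAL_S)
    while t < duration_s:
        onsets.append(t)
        start = int(t * FS)
        for n in range(start, min(start + width, n_samples)):
            x[n] += 1.0  # overlapping pulses add
        t += rng.expovariate(1.0 / MEAN_INTERARRIVAL_S)
    return x, onsets


if __name__ == "__main__":
    x, onsets = poisson_pulse_train(10.0, seed=1)
    gaps = [b - a for a, b in zip(onsets, onsets[1:])]
    print(len(onsets), sum(gaps) / len(gaps))  # mean gap should be near 0.010 s
```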

The right image is a wavelet transform of a white Gaussian process. The apparent concentration of energy at the finer scales (higher frequencies) arises because the bandwidths of the wavelets inevitably increase as they are contracted. Thus a process with equal energy at each frequency in a Fourier transform shows power that increases exponentially with octave position (that is, in proportion to center frequency) in the wavelet transform.
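The effect can be quantified under simple, hypothetical assumptions. Treat each wavelet as an ideal band-pass filter with unit passband gain and constant Q, so that the band centered at f has bandwidth f/Q. White noise with one-sided power spectral density N0 then produces output power N0·f/Q. This power doubles from one octave to the next, a step of about 3 dB. The ideal filter shape, unit gain, and Q value are simplifications chosen for illustration, and the result depends on that gain normalization.

```python
import math

N0 = 1.0   # one-sided noise power spectral density (arbitrary units per Hz)
Q = 4.0    # hypothetical constant quality factor


def band_power(f_center, n0=N0, q=Q):
    """White-noise power passed by an ideal unit-gain band of width f_center / q."""
    return n0 * f_center / q


octaves = [16.0 * 2 ** k for k in range(11)]  # 16 Hz ... 16384 Hz
powers = [band_power(f) for f in octaves]
steps_db = [10.0 * math.log10(b / a) for a, b in zip(powers, powers[1:])]
print([round(s, 2) for s in steps_db])  # 3.01 dB gain per octave
```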